Convergence of Optimistic and Incremental Q-Learning
Authors

Yishay Mansour

Abstract

We show the convergence of two deterministic variants of Q-learning. The first is the widely used optimistic Q-learning, which initializes the Q-values to large initial values and then follows a greedy policy with respect to the Q-values. We show that setting the initial values sufficiently large guarantees convergence to an ε-optimal policy. The second is a new algorithm, incremental Q-learning, which gradually promotes the values of actions that are not taken. We show that incremental Q-learning converges, in the limit, to the optimal policy. Our incremental Q-learning algorithm can be viewed as a derandomization of ε-greedy Q-learning.
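To make the two variants concrete, here is a minimal tabular sketch in Python on a hypothetical two-state deterministic MDP. The MDP itself, the 1/n step-size schedule, and the fixed promotion increment `promo` are illustrative assumptions, not the paper's exact constructions.

```python
import numpy as np

# A tiny deterministic 2-state, 2-action MDP (a hypothetical example):
# next_state[s, a] is the successor state, reward[s, a] the reward.
next_state = np.array([[0, 1],
                       [0, 1]])
reward = np.array([[0.0, 1.0],
                   [0.0, 2.0]])
n_states, n_actions = reward.shape
gamma = 0.9
V_MAX = reward.max() / (1.0 - gamma)  # upper bound on any discounted return


def optimistic_q_learning(steps=5000):
    """Optimistic Q-learning: large initial Q-values, purely greedy actions."""
    Q = np.full((n_states, n_actions), V_MAX)  # optimistic initialization
    visits = np.zeros_like(Q)
    s = 0
    for _ in range(steps):
        a = int(np.argmax(Q[s]))               # greedy in the current Q-values
        s2, r = next_state[s, a], reward[s, a]
        visits[s, a] += 1
        alpha = 1.0 / visits[s, a]             # decaying step size (an assumption)
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2
    return Q


def incremental_q_learning(steps=5000, promo=1e-3):
    """Incremental Q-learning: greedy on Q plus a promotion term that
    slowly grows for actions that are not taken (simplified schedule)."""
    Q = np.zeros((n_states, n_actions))
    promotion = np.zeros_like(Q)
    visits = np.zeros_like(Q)
    s = 0
    for _ in range(steps):
        a = int(np.argmax(Q[s] + promotion[s]))  # greedy on promoted values
        promotion[s] += promo                    # promote every action in s ...
        promotion[s, a] = 0.0                    # ... but reset the one taken
        s2, r = next_state[s, a], reward[s, a]
        visits[s, a] += 1
        alpha = 1.0 / visits[s, a]
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2
    return Q


if __name__ == "__main__":
    print("optimistic :", optimistic_q_learning())
    print("incremental:", incremental_q_learning())
```

Both agents are fully deterministic: optimistic Q-learning explores because the inflated initial values make untried actions look best, while incremental Q-learning explores because the promotion term of an untaken action eventually overtakes the greedy gap, which is the sense in which it derandomizes ε-greedy exploration.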
Similar resources
A Hybrid Framework for Building an Efficient Incremental Intrusion Detection System
In this paper, a boosting-based incremental hybrid intrusion detection system is introduced. This system combines incremental misuse detection and incremental anomaly detection. We use a boosting ensemble of weak classifiers to implement the misuse intrusion detection system. It can identify new classes of intrusions that do not exist in the training dataset for incremental misuse detection. As...
On the Convergence of Optimistic Policy Iteration
We consider a finite-state Markov decision problem and establish the convergence of a special case of optimistic policy iteration that involves Monte Carlo estimation of Q-values, in conjunction with greedy policy selection. We provide convergence results for a number of algorithmic variations, including one that involves temporal difference learning (bootstrapping) instead of Monte Carlo estim...
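A minimal sketch of the kind of scheme this entry describes: Q-values are moved partway toward Monte Carlo returns while the policy stays greedy with respect to the current estimates. The toy episodic MDP, the exploring starts, and the fixed step size are assumptions for illustration; the paper's algorithmic variations differ precisely in such choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy episodic MDP (assumption): states 0 and 1, actions 0 and 1;
# any index >= n_states (here, 2) acts as an absorbing terminal state.
next_state = np.array([[1, 2], [2, 2]])
reward = np.array([[0.0, 1.0], [2.0, 0.0]])
n_states, n_actions = reward.shape
gamma = 0.9


def run_episode(Q, s0, a0):
    """Exploring start (s0, a0), then the greedy policy until termination."""
    traj = [(s0, a0, reward[s0, a0])]
    s = int(next_state[s0, a0])
    while s < n_states:
        a = int(np.argmax(Q[s]))
        traj.append((s, a, reward[s, a]))
        s = int(next_state[s, a])
    return traj


def optimistic_policy_iteration(iters=500, step=0.5):
    Q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        s0 = int(rng.integers(n_states))      # exploring start (an assumption)
        a0 = int(rng.integers(n_actions))
        traj = run_episode(Q, s0, a0)
        G = 0.0
        for s, a, r in reversed(traj):        # Monte Carlo return from each visit
            G = r + gamma * G
            Q[s, a] += step * (G - Q[s, a])   # partial ("optimistic") evaluation
    return Q


if __name__ == "__main__":
    print(optimistic_policy_iteration())
```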
Towards Finite-Sample Convergence of Direct Reinforcement Learning
While direct, model-free reinforcement learning often performs better than model-based approaches in practice, only the latter have so far supported theoretical guarantees of finite-sample convergence. A major difficulty in analyzing the direct approach in an online setting is the absence of a definitive exploration strategy. We extend the notion of admissibility to direct reinforcement learning ...
Further study on $L$-fuzzy Q-convergence structures
In this paper, we discuss the equivalent conditions of pretopological and topological $L$-fuzzy Q-convergence structures and define $T_{0},~T_{1},~T_{2}$-separation axioms in $L$-fuzzy Q-convergence spaces. Furthermore, the $L$-ordered Q-convergence structure is introduced and its relation with the $L$-fuzzy Q-convergence structure is studied in a categorical sense.
Stratified $(L,M)$-fuzzy Q-convergence spaces
This paper presents the concepts of $(L,M)$-fuzzy Q-convergence spaces and stratified $(L,M)$-fuzzy Q-convergence spaces. It is shown that the category of stratified $(L,M)$-fuzzy Q-convergence spaces is a bireflective subcategory of the category of $(L,M)$-fuzzy Q-convergence spaces, and the former is a Cartesian-closed topological category. Also, it is proved that the category of stratified $...